enh(policy): add Gen AI policy to review guide #344
Conversation
Once we are happy with this checklist, I'll open a PR to update our submission template as well.
* When you submit a package to pyOpenSci, please disclose any use of LLMs (Large Language Models) in your package’s generation by checking the appropriate boxes on our software submission form. Disclosure should generally include what parts of your package were developed using LLM tools.
* Please also disclose this use of Generative AI tools in your package's `README.md` file and in any modules where generative AI contributions have been implemented.
* We require that all aspects of your package have been reviewed carefully by a human on your maintainer team. Please ensure all text and code have been carefully checked for bias, bugs, and issues before submitting to pyOpenSci.
* Your acknowledgment of using Generative AI will not impact the success of your submission unless you have blindly copied text and code into your package without careful review and evaluation of its accuracy, and for any systemic bias.
I don't know if this can realistically be promised. I can imagine some reviewers would not want to participate in such a review. Imagine if you are asked to review a paper and they acknowledge that anonymous non-consenting students and a paid ghostwriter wrote portions of the paper, but the author has read it all so it's good to go. You may well believe that constitutes research misconduct, because it violates the policies of your university and of other journals that you typically review for (which may use COPE and CRediT).
You could say that reviewers assigned to the submission will have to promise it won't affect their review. (IMO, reviewers need to be aware of this and have an informed opportunity to choose whether to consent.)
Right. Maybe something like "use of generative AI, if disclosed, does not disqualify your package from review. Your disclosure is used to match your submission with reviewers with similar values and experience regarding gen AI. Incomplete disclosure of gen AI could affect your review success, as reviewers are volunteers and retain discretion to discontinue their review if this, or any other part of your submission appears inaccurate on closer inspection."
I'm not sure we want to match packages with "reviewers who share your values". That (a) makes it even harder to find reviewers (instead of asking anyone, I now need to ask first "Where do you stand on the copyright and LLM debate?") and (b) splits pyOpenSci into two parts: the "pyOpenSci with AI-friendly authors and reviewers" and the "pyOS with anti-LLM authors and reviewers".
This is also "where do you stand on publication ethics". Unpaid reviewing of scholarly work is service to the epistemic integrity of a community and knowledge. I think the motivations to review for pyOS are more akin to reviewing for scholarly journals than to conducting an external security audit (which is almost always a paid activity and has a narrow purpose).
To impose on reviewers a uniform policy that violations of publication ethics norms must not influence their review will drive those reviewers away. The remaining reviewers, being "LLM-friendly", may be prone to use LLMs to create the appearance that they have conducted a review. (This is a huge problem for CS conferences where chairs are desperate for reviewers and people would like the professional recognition of being a member of the program committee without doing the work.)
Right, ya ^
We are responding to an ethical and practical problem that has been forced upon us, not creating one. If we do nothing, then submissions could be clogged with generated packages, and anyone who has an ethical problem with that (selfishly, me) leaves. If we require disclosure, we allow people the opportunity, akin to informed consent, to choose how they contribute their time. If reviewers are fine with LLM-generated packages, great. If not, also fine; they can choose what to review. The standards of review don't speak to LLM-generated code and should apply uniformly.
Similarly, from the reader's PoV, if the reader is fine with LLM-generated packages, cool, that's still useful information. If the reader is not, also cool, they know what to avoid.
From the author PoV, if I am fine with LLM-generated packages, fine: if I'm seeking external review, I should feel comfortable being honest about its development process. If I am not fine with LLM-generated packages, I would not submit my work here without a thoughtful LLM policy, because I would not want my work associated with what I believe to be unethical slop, and disclosure allows my work to remain separate.
Requiring disclosure expands the reviewer, submitter, and reader pools rather than contracting them. I am not seeing an articulation of harms from having a nonuniform reviewer pool when the review standards are uniform and checked by an editor who is a third party to reviewer choice.
Ok this is also super helpful. I added this statement because @hamogu made a powerful and valid argument that some people will intentionally not disclose use. We could inadvertently penalize honest people with great intentions who are using these tools in productive and thoughtful ways.
I don't want to create a divide. I also don't want prejudice in our system where any package that has been supported by an LLM is rejected.
We can have opinions about these tools and still work together to make informed decisions about where to invest our time. If a reviewer finds that a module is poorly written / wrong copy-pasta LLM output, they could opt to say "no thank you", or they could opt to ask the authors to rewrite; we pause the review, and then they restart the review when fixes are made.
What about something like: (open to suggestions)
- Your acknowledgment of using Generative AI will not prejudice the success of your submission. However, a reviewer can and will ask you to revisit your package's content if it appears that sections have been copied and pasted from other sources without human review.
- [ ] Some parts of the package were created using LLMs in some way.
  * Please check any of the boxes below to clarify which parts of the package were impacted by LLM use
    - [ ] LLMs were used to develop code
    - [ ] LLMs were used to develop documentation
    - [ ] LLMs were used to develop tests
    - [ ] LLMs were used to develop infrastructure (CI, automation)
so i think a checklist can be good for prompting on different things, but i do think we want this to be a text response - "used to develop code" could be anything from "had tab autocompletion on and used it once" to "wholly generated by LLMs."
So maybe something like the following (see the template sketch after this list)...
- Generative AI was used to produce some of the material in this submission
- If the above box was checked, please describe how generative AI was used, including
- Which parts of the submission were generated: e.g. documentation, tests, code. In addition to a general description, please specifically indicate any substantial portions of code (classes, modules, subpackages) that were wholly or primarily generated by AI.
- The approximate scale of the generated portions: e.g. "all of the tests were generated and then checked by a human," "small routines were generated and copied into the code."
- How the generative AI was used: e.g. line completion, help with translation, queried separately and integrated, agentic workflow.
- If generative AI was used, the authors affirm that all generated material has been reviewed and edited for clarity, concision, correctness, and absence of machine bias. The authors are responsible for the content of their work, and affirm that it is in a state where reviewers will not be responsible for primary editing and review of machine-generated material.
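As a rough illustration only (the exact wording was still under discussion in this thread), the proposal above might look something like this as a markdown snippet in the submission issue template; section title and field layout here are placeholders:

```markdown
<!-- Illustrative sketch; wording and layout are placeholders, not agreed policy text -->
## Use of generative AI

- [ ] Generative AI was used to produce some of the material in this submission.

If you checked the box above, please describe how generative AI was used:

* **Which parts were generated** (e.g. documentation, tests, code; note any classes,
  modules, or subpackages that were wholly or primarily generated):
* **Approximate scale of the generated portions** (e.g. "all of the tests were generated
  and then checked by a human", "small routines were generated and copied into the code"):
* **How generative AI was used** (e.g. line completion, help with translation, queried
  separately and integrated, agentic workflow):

- [ ] The authors affirm that all generated material has been reviewed and edited for
      clarity, concision, correctness, and absence of machine bias, and that the submission
      is in a state where reviewers will not be responsible for primary editing and review
      of machine-generated material.
```

The free-text prompts mirror the three bullets above, and the closing checkbox restates the affirmation so that editors have something concrete to point back to during review.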
I like this direction, but I'm stuck on a lot of questions about this part:
- What does "authors are responsible for the content of their work" mean? Is that meant to be a statement of provenance or merely of agreement?
- What if the submission contains a page of code that matches verbatim a different package with an incompatible license/no attribution? Is that misconduct, something a reviewer would politely ask them to fix, or acceptable as-is if it is plausible to believe it was tab-completed instead of flagrantly copied? (In a journal context, that would involve rejection and possible reporting of misconduct to the authors' institutions.)
- What if that code was accepted in a PR from a minor contributor who is not an author on a paper? (Super common for projects with many contributors. You might have hundreds of minor contributors, but a dozen core developers and maintainers.) Is there an expectation that maintainers practice due diligence? (This is why I think DCO-style norms are so important.)
- If the "fix" remedy is chosen, does that whole part of the code need to be rewritten using clean-room methods or is it enough to change variable names so it no longer shows up as a verbatim match?
What does "authors are responsible for the content of their work" mean? Is that meant to be a statement of provenance or merely of agreement?
it's a clarification within the affirmation that "the authors have reviewed and stand behind their work." regardless of provenance, the authors are the ones that are submitting the thing to be reviewed, and they are responsible for the content of the thing being reviewed.
this is meant to address this:
> What if that code was accepted in a PR from a minor contributor who is not an author on a paper?
The people that are engaging in the review process take responsibility for the material they are asking to be reviewed.
> What if the submission contains a page of code that matches verbatim a different package with an incompatible license/no attribution?
I think this is important but should probably be a separate item, e.g. in this comment: "we currently ask authors to write something about the state of the field of neighboring packages, ... if authors have generated some substantial part of their package that could have conceivably been "inspired by"/copied from another existing package, ask if they have searched for related implementations, and write something short about why not use that, and if the code does appear to overlap substantially, add some attribution. ..."
it might be a pretty high bar, and i would definitely be open to disagreement on it, because i agree that we should encourage people to be responsible with provenance, but also think we can't ask someone to chase down the provenance of an undefinably small chunk of code. i again think in terms of facilitating review rather than what i would think are optimal development practices - "if there was a whole module of generated code that probably drew from a popular package, what would a reviewer need to know, and what is a standard we could expect from an author regarding code duplication from LLM code generation"
if the "fix" remedy is chosen
i would think attribution would be the preferred remedy here, but i also don't think that needs to be prescribed in the policies and can be a matter of negotiation between reviewers, editor, and author.
Thanks. Attribution is only a "fix" if the licenses are compatible.
More generally, I'm worried that we're losing a culture (admittedly very inconsistently-practiced) of due diligence and replacing it with one in which everyone has plausible deniability, which is then vulnerable to DOS attack by dishonest academic-bean-farmers (and undermines the legitimacy of the entire JOSS/pyOS/+ community). If adequate redress for plagiarism in the scholarly literature was blame-free addition of the citation, it would be way more prevalent and the job of reviewer would be way more fraught and unappealing (and biased outcomes would proliferate). The system is built on presumption of good faith, with exceptions being scarce.
Maybe a happy middle ground could be an optional text-box?
I don't think many folks are currently tracking where/when they are using LLMs in a systematic or auditable way, or at least I would think it's rare. You may, however, recall wanting to auto-generate documentation explaining your functions.
I think the next item on the list, with a human acknowledgement, also sets this expectation and accountability for submitters.
A text-box could ask to describe (or link) the project's policies on use of LLM products and what processes/due diligence it conducts to ensure provenance, attribution, and license compatibility.
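For instance, such an optional prompt might look something like this in the submission template (hypothetical wording, sketched only to make the idea concrete):

```markdown
<!-- Hypothetical optional field; wording not agreed in this thread -->
**Project policy on generative AI (optional)**

Describe, or link to, your project's policy on the use of LLM products, and the
processes or due diligence used to ensure provenance, attribution, and license
compatibility of contributed code:

* Your text or link here
```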
Hey everyone 👋🏻 I'm checking back in on this PR. There is some great discussion here, and I agree that having that checklist seems like too much, and a hard-to-answer/track item, as Lauren points out. I am interested in how people are using these tools in their development. Maybe I could add that to our presubmission survey (as an optional question).
We are not a journal, so it is not our job to focus on publication-quality work per se. And it's possible that we have reviewed packages where code was copied from Stack Overflow or another package; we wouldn't necessarily know unless someone recognized it during the review process. At the same time, JOSS can publish via our review, so they expect a high-quality review; however, JOSS has minimal direction regarding generative AI right now. We have a mission to educate and support maintainers as the ecosystem evolves. As such, these checkboxes also serve to raise awareness.
I also like what @sneakers-the-rat laid out above to simplify, and Jed, as always, I love the thoughtful consideration of how these tools impact peer review. Let's focus on protecting our reviewers and raising awareness of some of the issues here. I'm incorporating some of Jonny's suggestions above, along with a few other ideas below.
What do you all think about this as a modification?
- [ ] Generative AI was used to produce some of the material in this submission
- [ ] If generative AI was used in this project, the authors affirm that all generated material has been reviewed and edited for clarity, concision, correctness, and absence of machine bias.
If you have a policy around generative AI submissions, please provide a link to it below:
* Your Link Here <!-- remove this line if you don't have a link -->
> The authors are responsible for the content of their work, and affirm that it is in a state where reviewers will not be responsible for primary editing and review of machine-generated material.
Jonny, I am not exactly sure where that last part should go (I blockquoted the text above). Is this a way of addressing the responsibility item that Jed mentioned above, considering they likely DO have AI contributions that they don't even know about if they have a large contributor base?
> I don't think many folks are currently tracking where/when they are using LLMs in a systematic or auditable way, or at least I would think it's rare.
yeah, agreed, i think leaving it as an open-ended thing allows for that: you might not remember exactly what was generated or not, but you can probably say "i used the llm to generate initial class skeletons in most of the modules..." or something that captures the gist of the usage.
I believe strongly that it is not overburdensome to ask people to write a few sentences on the tools they used. I think a checkbox without any further explanation isn't really all that informative, since the range of what "i used llms" covers is so wide. If they have a policy that captures how the authors used LLMs during creation of the package, that's fine, and they can include that here. However, I think that we do not accomplish the goal of allowing reviewers to know what they are reviewing with a simple checkbox and an optional link to a policy.
I think the checkboxes both provide too little detail and may not be possible to honestly affirm.
For any bigger project (i.e., one in which not all contributors are authors), the authors can't really claim due diligence beyond project policies and maintainer procedures. Even if you just have a dozen authors whose LLM use may have evolved over time, honestly detailing those practices is a big task. I think it makes sense to ask for authors to explain their policies and procedures, and (for large projects) what they believe contributors are doing based on the maintainers' experience reviewing and interacting with contributors.
I don't think a maintainer can honestly check the second box unless the project has clear guidelines and believes it can trust everyone to abide by those guidelines. Even as an individual sole-author, what does it mean to say content is free from bias? Bias is the natural state of technology, and that applies also to people (though "AI" automates it in various ways that the human may or may not understand). Similarly, what would it mean to affirm that no LLM-generated content constitutes plagiarism? Such questions are incredibly fraught if you engage more deeply with what plagiarism means. The practical place to audit is at the process, not retrospective assessment of the artifact. I feel like asking people who are thinking critically to make such affirmations projects a shallow understanding from pyOpenSci and encourages people to be simplistic rather than critical in their responses. (And that carries over to practices outside the context of the review.)
I think a practical resolution is to just require a free text field that describes, to the best of the authors' ability, the use of LLMs to produce the thing that is being reviewed. in the case of a big package with lots of external contributors, they could just write "there are lots of external contributions and we have no idea how they were developed. we manually reviewed all those, here is a link to our contributing docs and maybe it even has a section on LLM usage in it." Then they could additionally describe the things the authors do know about, like their own usage of llms. We are asking for authors to describe how the work was created so that reviewers can understand what they are reviewing. If a reviewer doesn't want to review something that doesn't have a strict llm policy (or any at all), then that's fine. If a reviewer is fine with that, that's also fine.
I think the question of auditing process is a separate question from this - e.g. like how we require having docs and tests, we could require having a contributing policy that addresses LLMs. I don't necessarily think we should require that (none of my projects have an LLM policy), just saying that it sounds like a separate question: the purpose of the statement in the submission is to allow reviewers to know what they are reviewing, and any other requirements on the content of the submission are, imo, a different thing than the disclosure requirement.
re: the second checkbox, i think a minor revision to add "... has been reviewed and edited by the authors for ..." would help. I would probably add a preface like "The authors affirm that they are responsible for the content of their submission, and have taken appropriate measures to minimize reviewer labor..." before the specific requirements so that it's clear what the affirmation means. again, like with the disclosure, we are not asking the authors to be experts in what plagiarism or bias means, but to give the reviewers something to point to, like "hey, you said you reviewed this beforehand, and yet here is a bunch of dead code/a section with extremely biased outcomes/etc., so you didn't live up to the things i believed to be true when i agreed to review"
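Put together, the revised affirmation might read something like the following. This is an illustrative draft only, combining the preface and edits suggested above; it is not agreed policy text:

```markdown
<!-- Illustrative draft combining the suggestions above; not agreed wording -->
The authors affirm that they are responsible for the content of their submission
and have taken appropriate measures to minimize reviewer labor. In particular:

- [ ] All generated material has been reviewed and edited by the authors for
      clarity, concision, correctness, and absence of machine bias.
- [ ] The submission is in a state where reviewers will not be responsible for
      primary editing and review of machine-generated material.
```

The preface spells out what the checkboxes are affirmations of, which is the "something to point to" function described above.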
### Disclosure of generative AI use in pyOpenSci reviewed packages
* When you submit a package to pyOpenSci, please disclose any use of LLMs (Large Language Models) in your package’s generation by checking the appropriate boxes on our software submission form. Disclosure should generally include what parts of your package were developed using LLM tools.
along with above, checking boxes -> describing your use of generative AI.
See what you think of my edits below.
The policy below was co-developed by the pyOpenSci community. Its goals are:
* Acknowledgment of and transparency around the widespread use of Generative AI tools (with a focus on Large Language Models (LLMs)) in Open Source.
* Protect peer review efficiency: Ensure human review of any LLM-generated contributions to a package to protect editor and reviewer volunteer time in our peer review process.
a little confusing - we want to avoid pyOS reviewers being the first ones to review LLM-generated code, so by "human review" we mean "prior author review." so maybe something like "Ensure an equitable balance of labor, where authors have ensured that generated material is in a state that minimizes review time, and reviewers are not responsible for correcting errors and unclarity from machine generated code. The PyOS review process should not be used as a mechanism for outsourcing human review of generated code."
Totally - let me try to turn your comment into an inline edit. I agree it's confusing as written, but the goal is definitely an equitable balance of labor - authors need to carefully review FIRST. I like that language.
our-process/policies.md
The policy below was co-developed by the pyOpenSci community. Its goals are:
* Acknowledgment of and transparency around the widespread use of Generative AI tools (with a focus on Large Language Models (LLMs)) in Open Source.
along with this - allow reviewers to make informed decisions about what they choose to review, and allow authors to have reviewers that align with their values and practices. facilitate a review process that aligns with the ethics and values of its participants.
I appreciate this. Let me adjust this section.
Co-authored-by: Jed Brown <[email protected]>
The policy below was co-developed by the pyOpenSci community. Its goals are:
* Acknowledgment of and transparency around the widespread use of Generative AI tools with a focus on Large Language Models (LLMs) in Open Source.
* Protect peer review efficiency: Ensure human review of any LLM-generated contributions to a package to protect editor and reviewer volunteer time in our peer review process.
* Protect peer review efficiency: Ensure human review of any LLM-generated contributions to a package to protect editor and reviewer volunteer time in our peer review process.
* Ensure an equitable balance of effort in the peer review process. Authors acknowledge that a human has carefully reviewed parts of the package that are AI-generated. Generated material should be in a state that minimizes review time. Our reviewers are not responsible for correcting errors in machine-generated content.
There is another comment Jonny wrote that I want to capture, which is in the main body of this review. I copied it below:
> allow reviewers to make informed decisions about what they choose to review, and allow authors to have reviewers that align with their values and practices. facilitate a review process that aligns with the ethics and values of its participants.
I agree with Moritz that it is VERY hard to find reviewers right now so values alignment will be tricky. However, I think allowing a reviewer to decline reviewing on the basis of LLM-generated content is entirely acceptable. And if they look at the issue submission, they will see that LLMs are involved and how.
Should we add a statement to our reviewer guide about this?
yes - and in general we could remind reviewers that they are volunteering and can withdraw their review for whatever reason. this is sort of obvious by the nature of volunteer review, but we just want to reassure reviewers: "you are in control of your time and we are glad to have it, but we don't want to ask you to do something you are not comfortable with"
### Disclosure of generative AI use in pyOpenSci reviewed packages
* When you submit a package to pyOpenSci, please disclose any use of LLMs (Large Language Models) in your package’s generation by checking the appropriate boxes on our software submission form. Disclosure should generally include what parts of your package were developed using LLM tools.
* When you submit a package to pyOpenSci, please disclose any use of LLMs (Large Language Models) in your package’s generation by checking the appropriate boxes on our software submission form. Disclosure should generally include what parts of your package were developed using LLM tools.
* When you submit a package to pyOpenSci, please disclose any use of LLMs (Large Language Models) in your package’s generation by checking the appropriate boxes and describing your use of generative AI in its development and/or maintenance. Disclosure should include what parts of your package were developed using Generative AI (LLMs).
* Raise awareness of challenges that Generative AI tools present to the scientific (and broader) open source community.
[Please see this GitHub issue for a discussion of the topic.](https://github.com/pyOpenSci/software-peer-review/issues/331)
* Disclosure further allows reviewers and editors to make conscious decisions around the types of packages that they wish to review code for.
* Raise awareness of challenges that Generative AI tools present to the scientific (and broader) open source community.
[Please see this GitHub issue for a discussion of the topic.](https://github.com/pyOpenSci/software-peer-review/issues/331)
@sneakers-the-rat, please have a look at this - it may not quite be the right language. But essentially the idea here is that if a reviewer who has strong opinions about LLM use in open source starts to review a package and then gets to a module that is clearly LLM-output copy-pasta, that could be aggravating to them, and it could also bias the review. So disclosure not only protects people's time, it also helps create a more equitable and value-aligned process (I think).
Ok friends. I've suggested a few modifications following the comments. Please let me know your thoughts. I like where this is going!!
This policy was developed based on a conversation here:
#331
I think we should review this policy and also consider linking to a blog post that covers some of our broad concerns and reasons for developing such a policy.